Abstract

We provide a coupling proof that the transposition shuffle on a deck of $n$ cards mixes at
rate $Cn \log n$ with a moderate constant $C$. This rate was determined by Diaconis and
Shahshahani, but a natural probabilistic coupling proof has been missing, and questions of
its existence have been raised. The proof, and indeed any such proof, requires that we
enlarge the methodology of coupling to include intuitive but non-adapted coupling rules,
because a typical Markovian coupling is incapable of resolving finer questions of rates.

1 Introduction. 

Random shufflings of a deck of $n$ distinct cards are well-studied objects, and a frequent
metaphor describing a class of Markov chains invariant with respect to the symmetric group,
$S_n$. Here the focus is on transposition shuffling, one of the simplest shuffles, defined by
uniformly sampling the deck twice with replacement, and then interchanging the positions
of these cards, if they are different.
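The shuffle just defined can be sketched in a few lines. This is only an illustration: the function names and the choice to sample positions rather than card labels are assumptions made here (the two are equivalent for a uniform draw), not notation from the paper.

```python
import random

def transposition_step(deck, rng):
    """Apply one random-transposition step to `deck` in place."""
    n = len(deck)
    i = rng.randrange(n)  # first uniform draw
    j = rng.randrange(n)  # second uniform draw, with replacement
    if i != j:            # identical draws leave the deck unchanged
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def shuffle(n, steps, seed=0):
    """Run `steps` transposition steps starting from the ordered deck 0..n-1."""
    rng = random.Random(seed)
    deck = list(range(n))
    for _ in range(steps):
        transposition_step(deck, rng)
    return deck
```

With probability $1/n$ the two draws coincide and the step does nothing, which is why the chain is aperiodic.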

Clearly, as is always the case for mixing finite state Markov chains, the distribution
of the ordering of the deck converges in total variation to the invariant measure, which by
symmetry is uniform on the permutation group, $S_n$. It is the rate of mixing which presently
holds our interest, as well as the coupling methods by which one might attempt to show
good upper bounds on this rate.

This is a well-defined problem in probability theory, and one would expect that a coupling
argument would be the instrument of first choice. Indeed, there is such an approach, given
in the online notes of Aldous and Fill [2] (also see [1]). This method gives a rate of $O(n^2)$,
and will be discussed later in the article. Unfortunately, but necessarily, this rate is not
the optimal rate [we expect $O(n \log n)$, which was proven by Diaconis and Shahshahani [7]
using methods from the relatively rarified mathematical residential district of representation
theory]. This gap is apparent, and somewhat long-standing; indeed, Peres has listed the
problem of showing $O(n \log n)$ as the rate of uniform mixing, using a coupling approach,
as one of a number of interesting open problems (see [11]). The problem is also mentioned in
Saloff-Coste [12] and other important publications. Here we solve this problem. Moreover,
we deconstruct different ways of looking at this kind of problem, and enlarge the intuitions
that might guide us in coupling.

We would like to mention that Matthews gives a purely probabilistic proof of the
$O(n \log n)$ bound using strong stationary times. Diaconis and Saloff-Coste [6] also treat
a host of other transposition problems, and a survey of the many appearances of random
transpositions can be found in Diaconis [5]. Among the publications related to this research,
we would like to list the recent work of Berestycki and Durrett [3] on random transpositions,
and the paper of Hayes and Vigoda on a non-Markovian coupling for graph coloring problems.

1.1 Preliminary details. 

Suppose that the number of cards $n$ is fixed and define $S = \{0, 1, \ldots, n-1\}$, and suppose
$S^2$ is the state space of a sequence of iid random variables $\{\tau_t : t \in \mathbb{Z}\}$ which are uniformly
distributed on $S^2$. We will also assume, or construct, a bijection from $S^2$ to the set of
mechanisms creating a sequence of transpositions (or the identity) on $S$. Each $\tau_t$ thus produces
independent uniform transpositions, although there is a probability of $1/n$ that there will be no
change in the ordering of the deck. Given an initial probability distribution on $S_n$, we let the
law of the distribution on $S_n$ be actualized by a random permutation $X_t$ describing the images
of cards $0, 1, \ldots, n-1$ in order. This gives a well-defined Markov chain. It will sometimes be
useful to Poissonify the Markov chain to continuous time with iid exponentially distributed
interarrival times. In other words, $\tau_m$ occurs at a time $t_m = T_1 + T_2 + \cdots + T_m$, where the $T_i$ are
iid mean one exponentially distributed random variables. We denote the law of the totality
of this setup by $\mathcal{P}$, and if $\lambda$ is the distribution of the initial configuration we denote the law
of this conditional distribution by $\mathcal{P}|\lambda$.

A coupling argument would require the construction of a joint distribution on the product
of $\mathcal{P}$ and $\mathcal{P}' = (\mathcal{P}|\delta_{id})$, where $\delta_{id}$ denotes the point mass distribution on $S_n$ giving the identity
probability 1. This joint distribution must have marginal distributions that agree with $\mathcal{P}$
and $\mathcal{P}'$, and be such that the random permutations $X_t$ and $X'_t$ agree from some time $T$ onward.
Notice that the shuffle is invariant with respect to the symmetric group in the following
sense:

Definition. Suppose that $X_m$ is a Markov chain on $S_n$. The chain is said to be group
invariant if $\mathrm{dist}(\gamma X_{m+1} \mid X_m = \sigma)$ is equal to $\mathrm{dist}(X_{m+1} \mid X_m = \gamma\sigma)$ for all $\gamma, \sigma \in S_n$.

This is a homogeneity condition similar to that of independent increments, and says that
the shuffle is independent of the values or printed labels on the cards. It implies that the
distribution of cycle structures of $X_1$ given $X_0 = \sigma$ depends only on the cycle structure of
$\sigma$. The set of group elements with a given cycle structure forms a conjugacy class in the
group $S_n$. For example, if $\sigma$ is a transposition, i.e. a 2-cycle, then regardless of what $\sigma$
is specifically it has the same coupling and transition structure as any other transposition.

Since the identity transitions to each of the $\binom{n}{2}$ transpositions with equal probability and
remains at the identity with probability $1/n$, we could analyze this process as a Markov chain
on the conjugacy classes, i.e. classes of permutations with the same cycle structure. Further,
this random walk on the cyclic decompositions is biased toward intermediate central weights
of the permutations (where the weight is equal to the sum of the cycle lengths minus the
number of cycles, and equal to the minimum number of transpositions required to reduce the
permutation to the identity). We could have used this approach to analyze recurrence rates.
The number of distinct cycle structures is equal to the partition function on the integers,
$p(n)$, which grows exponentially with rate $\sqrt{n}$, much slower than the rate of $n \log n$ that
$n!$ grows with.

There are many ways to associate an element < a, b > in $S^2$ with a transposition. Each
card has a value or label printed on it, and each card has a location: the number of cards
above it on the deck plus 1. Let $Q$ be the set of values or labels, and let $P$ be the set of
positions of the cards. If < a, b > is a member of $P \times P$ then we associate < a, b > with
interchanging the locations of a card at location a and a card at location b. If < a, b >
is a member of $Q \times P$ (in which case we will use < [a], b > notation), then we associate
< [a], b > with taking card [a] and placing it at location b, and then using the card that
was in location b to replace [a] in its original location. When we begin to couple, we will be
following the motion of two decks. It is natural to apply the same operation to each one of
the two decks. For example, if we did this with the $Q \times P$ association for evolution, we could
do the same transposition < [a], b > for both to eventually obtain coupling in $O(n^2)$, as
in Aldous and Fill [2]. We, however, have much more flexibility than that with a bijective
map from $Q \times P$ to $Q \times P$, called an association mapping, to relate the coupled moves of
each deck. An association mapping tells us how to couple the immediate descendants of two
group elements that are coupled. Recall that we must preserve relations, like siblings and
cousins in the tree.
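A small sketch may help fix the $Q \times P$ association. It applies the same move < [a], b > to two decks and checks, by brute force on a small deck, that the number of locations where the decks differ never increases. The helper names are hypothetical, and the exhaustive check over a four-card deck is an illustration rather than a proof.

```python
from itertools import permutations

def move_card_to(deck, card, loc):
    """Q x P move < [card], loc >: put `card` at 1-based location `loc`;
    the displaced card takes `card`'s old location."""
    new = list(deck)
    old = new.index(card)
    new[old], new[loc - 1] = new[loc - 1], new[old]
    return new

def discrepancies(top, bottom):
    """Number of locations where the two decks hold different cards."""
    return sum(x != y for x, y in zip(top, bottom))

def never_increases(n):
    """Check, over all bottom decks and all moves, that applying the same
    Q x P move to both decks never increases the discrepancy count."""
    top = list(range(1, n + 1))
    for bottom in permutations(top):
        before = discrepancies(top, bottom)
        for card in top:
            for loc in range(1, n + 1):
                after = discrepancies(move_card_to(top, card, loc),
                                      move_card_to(bottom, card, loc))
                if after > before:
                    return False
    return True
```

Fixing the top deck as the ordered deck loses no generality here, since relabeling the cards of both decks at once does not change which locations disagree.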

Most coupling arguments are made up of present and past measurable constituents; that
is, the coupling method is adapted to an increasing sequence of $\sigma$-fields so that the present
and past are measurable with respect to their corresponding $\sigma$-fields, and the $\sigma$-fields
have enough extra randomness to perform independent experiments, subdivide atoms, and
so on. In some ways this tendency toward adaptive coupling is historical, and in some ways it
is natural to follow one's intuition, and then make it rigorous. The first coupling argument
most people see is a passive coupling in which two Markov chains are allowed to go their own
ways independently of each other until they happen to reach the same state at the same
time. From this random time onward they are coupled together. It is also true that if we had
perfect knowledge of the situation, we would be able to make the optimal coupling at any
time. Unfortunately, the numbers are usually too large and the relationships too complex.
The point here is that non-adapted coupling can be natural, intuitive, and have great power
in solving problems. Indeed, as we shall see, we can even couple so that we anticipate the
future and prepare for it, while maintaining the only essential ingredient of coupling, that
of having perfect (or near perfect) distributions on the marginal processes. Moreover, it is
possible to have couplings made up of surgically cut and pasted pieces of sample path.

1.2 Strong Uniform Mixing, Weak Bernoulli, and Coupling.

If $X_t$ and $Y_t$ are stochastic processes, then a coupling of $X_t$ and $Y_t$ is a joint probability
distribution on the product, $(X_t, Y_t)$, so that the marginal distributions on $X$ and $Y$ agree
with the original distributions. We say that $X_t(\omega)$ and $Y_t(\omega)$ are coupled at the random time $T$
if for $t > T$ it is the case that $X_t = Y_t$. If $T$ is finite a.s., then this argument shows that the
distributions of $\{X_t, t > k\}$ and $\{Y_t, t > k\}$ are converging in the total variation distance as
$k$ becomes large, and hence have convergent distributions.

Diaconis and Shahshahani [7] define a finite state Markov chain $X$ with invariant probability
$U$ to be strong uniform mixing if there is a stopping time $T$ so that $P[T = k, X_k = \sigma]$
is independent of the group element $\sigma$. Because of this and the invariance of the uniform
distribution on the group $G$, for any $\sigma$ we have $P[X_k = \sigma \mid k \geq T] = 1/|G|$, as the invariant
measure is uniform on $G$. Regardless of the distribution of $X_0$, we have the following bound
on the total variation norm, $\|U - \mathrm{dist}(X_k)\|_{TV} \leq P[T > k]$, because this is the only part
of the probability space where the total variation is not forced to be 0. The rate of mixing
is carried by the distribution of the stopping time. Coupling arises because the total variation
norm is achieved by joining the distributions together in a probability preserving way.
The process is Markov, so it makes sense for the definition to be independent of the initial
state. Coupling for general processes is usually connected to weak Bernoulli, which has a
rich history (a.k.a. absolutely regular and $\beta$-mixing [13]).

Definition. A finite-valued stochastic process $X_m$ is weak Bernoulli if there is a
coupling $\{(X'_m, X''_m) : m \in \mathbb{Z}\}$ such that (i) $\{X'\}$ and $\{X''\}$ have the same distribution as
$\{X\}$, (ii) the past of $X''$, $\{X''_m : m = 0, -1, -2, \ldots\}$, is independent of $\{X'\}$, and (iii) there is
a random variable $T$ so that $m > T$ implies that $X'_m = X''_m$.

In this case, the future becomes independent of past values in a strong way. The pathwise 
coupling version of weak Bernoulli is 

Definition. A finitely valued stochastic process $X_m$ is tree weak Bernoulli if there is
a coupling $\{(X'_m, X''_m) : m \in \mathbb{Z}\}$ such that (i) $\{X'\}$ and $\{X''\}$ have the same distribution
as $\{X\}$, (ii) the past of $X''$, $\{X''_m : m = 0, -1, -2, \ldots\}$, is independent of $\{X'\}$, (iii) there
is an a.s. finite random time $T$ so that $m > T$ implies that $X'_m = X''_m$, and (iv) the
coupling respects the tree structure of future sample paths, so if $X'$ and $X''$ are coupled at
time $m$ then each of their descendants (or successors) are also coupled together, i.e. the
coupling is given by a tree automorphism of the branching future paths.

The terminology comes from Hoffman and Rudolph [9], in which they used tree very weak
Bernoulli to study isomorphisms of 1 to p endomorphisms. The conditional distributions of
$\{X_m, m \in [1, 2, \ldots, M]\}$ live on the set of labeled trees of length $M$. We raise the definition
because many, if not most, coupling methods have this property of being tree consistent.

When applying tree weak Bernoulli or tree coupling, it is important to consider the
events defined by the finitely valued random variables to be atomless, e.g. a subset of the
unit interval. Achieving optimal total variation norm typically requires subdividing these
events into events of smaller but arbitrary probability.

In any given situation, it is possible, in principle, to find the optimal coupling because 
the total variation norm is achievable. However, the size of the state space and dependence 
within the process often make this impractical. 

We developed methods to speed up the coupling time while maintaining the distributions
on the marginal processes. The methods rely on intuition and insight, and were usually
dynamical. By dynamical, we mean that the coupling at a given time depended upon the
process up to that time, and, perhaps, some external sources of independent randomness. In
other words, the coupling was adapted to the pair of processes as they were being constructed.
Unlike uniform strong mixing, tree weak Bernoulli is pathwise and needs a probability-preserving
pathwise isomorphism between the future trees of possibilities of the processes.

To illustrate the difference between uniform strong mixing and tree weak Bernoulli, suppose
we have a Markov chain with state space $\{a_1, a_2, b_1, b_2\}$, and suppose the only allowable
transitions, each with conditional probability one-half, are:

$a_1 \to a_2$, $a_1 \to b_1$, $a_2 \to b_1$, $a_2 \to b_2$, $b_1 \to a_1$, $b_1 \to a_2$, $b_2 \to a_1$, $b_2 \to b_2$.

            a1     a2     b1     b2
    a1       0    1/2    1/2      0
    a2       0      0    1/2    1/2
    b1     1/2    1/2      0      0
    b2     1/2      0      0    1/2


The matrix is doubly stochastic, so the invariant probability is uniform on the states.
Suppose we have two Markov chains with this law, and initial states $X_0 = a_1$ and $Y_0 = b_2$.
Possible forward paths are

          a1                        b2
        /    \                    /    \
      a2      b1       and      a1      b2
     /  \    /  \              /  \    /  \
    b1  b2  a1  a2            a2  b1  a1  b2

The distributions of $X_2$ and $Y_2$ are equal and at equilibrium, $\|\mathrm{dist}(X_2) - \mathrm{dist}(Y_2)\| =
\|\mathrm{dist}(X_2) - U\| = 0$, while $\|\mathrm{dist}(X_1) - \mathrm{dist}(Y_1)\| = 1$, so $X_1 \neq Y_1$ a.s. On the other hand,
any tree coupling of $(X_0, X_1, X_2)$ with $(Y_0, Y_1, Y_2)$ has $P[X_2 = Y_2] = \frac{1}{2}$.

Similarly, the tree weak Bernoulli, or tree coupling coefficients of a process, may be 
different than the uniformly strong mixing coefficients, although the coupling distance is 
never less than the total variation distance. 
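The four-state example can be checked mechanically. The sketch below computes the one- and two-step distributions exactly and then searches over tree couplings that pair the two children of each node by a bijection. Restricting to such pairings (no subdivision of atoms) is a simplifying assumption of this sketch; a coupling that mixes pairings is a convex combination of them and so cannot do better than the best pairing. All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

half = Fraction(1, 2)
# Transition probabilities of the four-state chain from the text.
P = {
    "a1": {"a2": half, "b1": half},
    "a2": {"b1": half, "b2": half},
    "b1": {"a1": half, "a2": half},
    "b2": {"a1": half, "b2": half},
}
states = sorted(P)

def step(dist):
    """Push a distribution one step forward through the chain."""
    out = {s: Fraction(0) for s in states}
    for s, p in dist.items():
        for t, q in P[s].items():
            out[t] += p * q
    return out

def tv(d1, d2):
    """Total variation distance (half the L1 distance)."""
    return sum(abs(d1.get(s, 0) - d2.get(s, 0)) for s in states) / 2

def best_tree_match(x, y, depth):
    """Largest P[paths agree after `depth` steps] over child-pairing tree couplings."""
    if depth == 0:
        return Fraction(int(x == y))
    cx, cy = sorted(P[x]), sorted(P[y])
    return max(
        sum(best_tree_match(u, v, depth - 1) for u, v in zip(cx, perm)) / len(cx)
        for perm in permutations(cy)
    )

x1, y1 = step({"a1": Fraction(1)}), step({"b2": Fraction(1)})
x2, y2 = step(x1), step(y1)
```

Here `tv(x1, y1)` and `tv(x2, y2)` give the total variation distances after one and two steps, and `best_tree_match("a1", "b2", 2)` gives the best agreement probability available to a tree coupling of this restricted kind.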



2 Transposition shuffling for the case n = 3. 

The models we deal with here are invariant random walks on a group in which each step has
the same probability, although the case that different steps will go to the same location is
allowed and expected. The setup is a Markov chain that is itself a hidden Markov chain
with the same number of equally likely outcomes at any stage.

We begin with the group $S_3$. We will make some comparisons between two processes,
$X_t$, where $X_0 = (123)$, the identity, and $Y_t$. Consider the adjacency matrix, $M$,





            123   231   312   132   213   321
    123      3     0     0     2     2     2
    231      0     3     0     2     2     2
    312      0     0     3     2     2     2
    132      2     2     2     3     0     0
    213      2     2     2     0     3     0
    321      2     2     2     0     0     3


If $M$ is the matrix shown, then $(1/9)M$ is the stochastic matrix that represents the
dynamics of the chain.

The square of $M$ is $M^2 = 9I + 12N$, where $N$ is the 6 by 6 matrix consisting of all 1's. After
two steps, the maximum total variation distance between any starting point and any other
starting point is $(21 - 12)/81 = 9/3^4 = 1/9$. If we raise $M$ to the $2m$ power we get
$M^{2m} = (9I + 12N)^m = 9^m I + b_m N$, and continuing this calculation shows that the difference
between the (1,1) entry and the (1,2) entry is $9^m = 3^{2m}$. This means that after $2m$ iterations
the total variation norm between any two distinct starting places is $3^{2m}/9^{2m} = 1/3^{2m}$, so the
mixing is geometric with error at least $\frac{1}{2} \cdot 3^{-2m}$, and the mixing rate is 1/3. The 1/2 comes
from reducing the total variation distance between distinct starting places to that of one
starting permutation and the uniform distribution.
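These matrix computations are easy to confirm with exact arithmetic. The following sketch rebuilds $M$ from its description (3 on the diagonal, 2 between permutations of opposite parity, 0 otherwise) and evaluates total variation distances after $k$ steps. The function names are illustrative only.

```python
from fractions import Fraction

# Index order: the three even permutations first, then the three odd ones.
PERMS = ["123", "231", "312", "132", "213", "321"]
M = [[3 if i == j else (2 if (i < 3) != (j < 3) else 0) for j in range(6)]
     for i in range(6)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(6)) for j in range(6)]
            for i in range(6)]

def power(k):
    """Integer matrix M^k (divide by 9^k to get k-step transition probabilities)."""
    R = [[int(i == j) for j in range(6)] for i in range(6)]
    for _ in range(k):
        R = matmul(R, M)
    return R

def tv_between_starts(k, i=0, j=1):
    """Total variation distance between the k-step laws started at states i and j."""
    Mk = power(k)
    return Fraction(sum(abs(Mk[i][c] - Mk[j][c]) for c in range(6)), 2 * 9 ** k)

def tv_to_uniform(k, i=0):
    """Total variation distance between the k-step law from state i and uniform."""
    Mk = power(k)
    return sum(abs(Fraction(Mk[i][c], 9 ** k) - Fraction(1, 6)) for c in range(6)) / 2
```

For example, `power(2)` has 21 on the diagonal and 12 elsewhere, in agreement with $9I + 12N$, and `tv_between_starts(2)` returns `Fraction(1, 9)`.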

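Since every odd power of $M$ is the product of an even power of $M$ with $M$, we can similarly
compute the total variation distance in this case. An analogous calculation gives the total
variation distance between $X_m$ and the uniform $U$ as

$$\|\mathrm{dist}(X_m) - U\|_{TV} = \begin{cases} 3^{-m}, & m \text{ odd}, \\ \frac{5}{6} \cdot 3^{-m}, & m \text{ even}. \end{cases}$$

So for all $m$ we have

$$\frac{5}{6} \cdot 3^{-m} \leq \|\mathrm{dist}(X_m) - U\|_{TV} \leq 3^{-m}.$$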

Using the $Q \times P$ association (picking a face value and then picking a location to swap
cards with) there are 9 equal possibilities. If $X_0 = 123$, the identity, then $X_1$ has nine equal
outcomes, three of which are 123, and there are two chances for each of three transpositions,
132, 213, and 321. These are shown in the chart below. In keeping with group invariance,
the possible successors of any permutation are three chances to remain in its previous state
and two chances for each of three group elements of opposite parity.

    Q \ P     1      2      3
      1      123    213    321
      2      213    123    132
      3      321    132    123

On the other hand, if we wish to couple with the path arising from 132 using group
invariance, we get the following table:

    Q \ P     1      2      3
      1      132    312    231
      2      312    132    123
      3      231    123    132




Looking at the diagram, we see that four out of nine outcomes can be perfectly coupled,
and that the remaining line up as transpositions of each other. Actually, we can do
a bit better by coupling two that differ by a 3-cycle and then four that couple as
transpositions. Specifically, couple two 123s with the corresponding 123s, the two 132s with the
corresponding 132s, the remaining 123 with 312, and 213 with 132, and then the remaining
three couple in any way with the corresponding remaining 3 outcomes. After drawing some
infinite trees, and calculating many geometric series, one arrives at the optimal tree coupling,
which is $m 3^{-m}$. Note that this coupling distance is strictly larger than the total variation
distance, leaving open the possibility that tree coupling may be unable to achieve the optimal
mixing rates. As an aside, note that the trajectories of the process are given by a two
dimensional substitution system given by iterating the three by three diagrams above.
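The count of four perfectly coupled outcomes is a multiset intersection of the nine outcomes in each table, which the following sketch reproduces. The two lists are the table entries read row by row; the variable names are illustrative.

```python
from collections import Counter

# Nine equally likely successors of 123 and of 132, read row by row from the tables.
from_identity = ["123", "213", "321", "213", "123", "132", "321", "132", "123"]
from_132 = ["132", "312", "231", "312", "132", "123", "231", "123", "132"]

# Counter intersection keeps, for each outcome, the smaller of its two multiplicities.
common = Counter(from_identity) & Counter(from_132)
matched = sum(common.values())
```

Only 123 and 132 appear in both tables, each contributing two matches, so `matched` is 4 out of 9.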

3 Coupling to the future 

Consider a continuous-time process $(X_t, Y_t)$ on $S^* = \{0, 1\} \times \{0, 1, 2\}$ with generator

             (0,0)  (0,1)  (0,2)  (1,0)  (1,1)  (1,2)
    (0,0)     -12     0      0     10      1      1
    (0,1)       0   -12      0      1     10      1
    (0,2)       0     0    -12      1      1     10
    (1,0)      10     1      1    -12      0      0
    (1,1)       1    10      1      0    -12      0
    (1,2)       1     1     10      0      0    -12


Here the first coordinate flips $X_t \to 1 - X_t$ with rate 10, while the second coordinate
switches $Y_t \to (Y_t - 1) \bmod 3$ or $Y_t \to (Y_t + 1) \bmod 3$ with rate 1 each. However,
every time the second coordinate changes, the first coordinate must also change.

We want to find a fast coupling for the above process. Coupling the first coordinate is
simple: wait with rate 20, then assign with equal probabilities of $\frac{1}{2}$ either 0 or 1 to both $X_t$
and $X'_t$. Similarly, coupling the second coordinate is also simple: wait with rate 3 before
assigning any one of the three values 0, 1, or 2 to both $Y_t$ and $Y'_t$. However, coupling both
coordinates simultaneously for the processes $(X_t, Y_t)$ and $(X'_t, Y'_t)$ may create the following
complication: let $T_1$ be the exponential r.v. with parameter 20, and $T_2$ be the exponential r.v.
with parameter 3. With large probability, the first coordinates will couple before the second,
i.e. $T_1 < T_2$. In two out of three cases, either $Y_t$ or $Y'_t$ will not change at $T_2$. Therefore in
two thirds of the cases, when we couple the second coordinate, we simultaneously decouple
the first. Thus we will need extra time for the first coordinates to couple again.

Coupling to the future works in the following way. We start by generating $T_1$ and $T_2$. If
$T_1 < T_2$, we will start by deciding the value for $Y_{T_2} = Y'_{T_2}$. If it is different from both $Y_0$ and
$Y'_0$, then we need to randomly generate $X_{T_1} = X'_{T_1}$, and we are done: the process is coupled
at $T_2$. However, if $T_1 < T_2$ and $Y_{T_2} = Y'_{T_2}$ matches either $Y_0$ or $Y'_0$, we will need to pair $X_t$
and $X'_t$ differently. Namely, we will need to couple $X_{T_1}$ and $1 - X'_{T_1}$, and keep $X_t = 1 - X'_t$
paired until time $T_2$. Therefore at $T_2$, when the second coordinates $Y_t$ and $Y'_t$ couple, one of
the first coordinates must also flip, thus coupling the two processes, $(X_t, Y_t)$ and $(X'_t, Y'_t)$.
In any situation, the coupling time will be $\max\{T_1, T_2\}$, while before it was larger.

Looking into the future of $Y_t$ and $Y'_t$, we decided whether to pair $X_t$ with $X'_t$ or $X_t$ with
$1 - X'_t$. We will call this pairing an association map.
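To put numbers on this, the sketch below assumes, as in the text, that $T_1$ and $T_2$ are independent exponential random variables with rates 20 and 3. It computes the probability that the first coordinates couple first and the mean of the coupling time $\max\{T_1, T_2\}$ using the standard identities $P[T_1 < T_2] = r_1/(r_1 + r_2)$ and $E[\max] = 1/r_1 + 1/r_2 - 1/(r_1 + r_2)$. The variable names are illustrative.

```python
from fractions import Fraction

r1, r2 = Fraction(20), Fraction(3)  # rates of T1 and T2, taken from the text

# Probability that the first coordinates couple before the second ones.
p_first_before_second = r1 / (r1 + r2)

# E[max(T1, T2)] = E[T1] + E[T2] - E[min(T1, T2)], and min(T1, T2) ~ Exp(r1 + r2).
mean_coupling_time = 1 / r1 + 1 / r2 - 1 / (r1 + r2)
```

The first quantity is 20/23, which is the "large probability" referred to above; the second is the mean coupling time achieved by looking into the future.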



4 Super-fast coupling. 



Consider an $n$-card deck. Recall that the random transposition shuffle occurs by making
two independent uniform choices of cards, and interchanging them. We assume time is
continuous, so the transpositions happen one at a time with exponential times of rate one
in between, i.e. transposition < [a], [b] > has rate $2/n^2$ for each pair of cards [a] and [b],
i.e. the two identical transpositions < [a], [b] > and < [b], [a] > happen with rate $1/n^2$ each.
Transposition < [a], [a] > has rate $1/n^2$. Diaconis and Shahshahani used group representation
methods to show that the mixing time for this shuffling process is $O(n \log n)$ (see [7] and [1]).
In [2] the possible coupling approach was shown to produce the upper bound $O(n^2)$, while [11]
lists showing $O(n \log n)$ mixing time via a coupling construction as an open problem. Here we
solve it.



4.1 Notations and vocabulary. 




    < [a], [b] >     - transpositions in the card shuffling process
    < [a], . >       - transpositions initiated by card [a]
    A_t              - the top shuffling process
    B_t              - the bottom shuffling process
    (A_t / B_t)      - the coupled process
    < . , . >_A      - transpositions in the top shuffling process A_t
    < . , . >_B      - transpositions in the bottom shuffling process B_t
    << . , . >>      - simultaneous transpositions in the coupled process
    i/j              - hidden association between positions/locations in the top
                       process and positions/locations in the bottom process that will
                       be used to establish the rates for the coupled process

4.2 Label-to-location and label-to-label transpositions.

One of the possible coupling constructions was described in [2]. There, at each step, a card
[a] and a location $i$ were selected at random, and the transposition << [a], $i$ >> that moves
card [a] to location $i$ was applied in both processes, $A_t$ and $B_t$. That is, we used the $Q \times P$
association for both decks. Clearly, this coupling slows down significantly when the number
of discrepancies is small enough, thus producing an upper bound of order $O(n^2)$, instead
of $O(n \log n)$. An equivalent result can be achieved by first randomly selecting a card [a]
and then applying transposition << [a], $i$ >> with location $i$ selected at random, if [a] is
not coupled, and applying transposition << [a], [b] >> of cards [a] and [b] in both processes,
for a randomly selected card [b], if [a] is coupled, i.e. in the situation when [a] is at the
same location for both processes, $A_t$ and $B_t$. The second coupling is important, as it can be
improved to match the correct $O(n \log n)$ order for mixing time. The improvement comes in
the form of a combinatorial trick similar to the one used by Euler in computing the number
of permutations of $n$ elements with all elements displaced.

From here on transpositions << [a], [b] >> will be called label-to-label transpositions.



4.3 Group invariance. 



Without loss of generality, we would like to state that, in our coupling construction, whenever
a coupled card, for example [b], is selected, and thus a transposition << [b], [a] >> with
another card is to be applied, we actually need not do this transposition, as the resulting
discrepancies will be the same (with different card values, of course) as before the
transposition. This is because they have the same cycle structure, and so are conjugate. Moreover,
if the card [a] is not coupled, and is waiting for the transposition << [a], $i$ >> with a
random $i$, then after << [b], [a] >> the card [b] will be uncoupled, and must wait for
<< [b], $i$ >> with the same random $i$, as $i$ has not been used yet. If we ignore transposition
<< [b], [a] >>, then [a] will continue waiting for << [a], $i$ >> as if no changes have happened.
If $\sigma$ is a permutation of face values on the cards, then from our point of view, the coupled
process $(\sigma A_t / \sigma B_t)$ is equivalent to $(A_t / B_t)$ because of group invariance of the process.
Also, if [a] is uncoupled, and << [a], $i$ >> is to occur, where site $i$ is occupied by the same
card in both processes, no relabeling is necessary (relabeling is optional). If we do not
relabel, the card [a] will have to reselect a random new $i$, and wait for << [a], $i$ >> to
happen. The key point is that the situation is invariant under label-to-label transpositions.



4.4 Improving the coupling. 

The basic strategy of the coupling described in this paper is to condition on a key $\sigma$-field in
the future and then to use this future information to arrange the intermittent events so that
the process is set up for a successful coupling event. This is completely legal as long as we
take care to have the marginal stochastic processes maintain the correct finite-dimensional
distributions. If this is followed, we are able to modify the joint distributions of the processes
however we like to obtain our goal.



Example. To illustrate the forthcoming construction (and our notation), suppose we have
four cards that are paired so they have two discrepancies at time $t_0 = 0$ ($d = 2$).

    deck A :      [1]  [2]  [3]  [4]
    deck B :      [1]  [3]  [2]  [4]
    location :     1    2    3    4

We pick a random uniform location $i_1 \in \{1, 2, 3, 4\}$ and an exponential time $t_1$.
Conditioned on the event of card [2] jumping to $i_1$ (i.e. << [2], $i_1$ >>) at time $t_1$, we
provide the following coupling rules for time $t \in [0, t_1]$.

Case I: $i_1 = 2$ or 3.

Here cards [1], [3] and [4] do label-to-label jumps only, and each of these jumps leads to card
setups that are isomorphic to the original setup, up to relabeling the cards. Because of this we
suppress noting any change at all. To illustrate this, a label-to-label jump << [1], [3] >> will
take the decks to

    deck A :      [3]  [2]  [1]  [4]
    deck B :      [3]  [1]  [2]  [4]

This latter setup is equivalent to the former setup, so this case leads to no change. At
the end, card [2] does the label-to-location jump << [2], $i_1$ >> at time $t_1$, canceling the
discrepancies. For $i_1 = 2$, for instance, this takes

    deck A :      [1]  [2]  [3]  [4]
    deck B :      [1]  [3]  [2]  [4]

to

    deck A :      [1]  [2]  [3]  [4]
    deck B :      [1]  [2]  [3]  [4]




Case II: $i_1 = 1$ or 4, one of the non-discrepancy locations. Again it suffices to only consider
$i_1 = 4$. Here are the rates for this case:

• Cards [1] and [3] do label-to-label jumps, and as in Case I, no real change occurs and
we again suppress any notational changes.

• On the other hand, if we pick card [4], then we couple both decks together as follows:
either we interchange cards [4] and [3] on the top (< [4], [3] >_A), getting

    deck A :      [1]  [2]  [4]  [3]
    deck B :      [1]  [3]  [2]  [4]

The other possibility is we interchange [4] and [3] on the bottom (< [4], [3] >_B) to get

    deck A :      [1]  [2]  [3]  [4]
    deck B :      [1]  [4]  [2]  [3]

Once one of the above two transpositions occurs, card [4] joins [1] and [3] as one of the
cards that does label-to-label jumps only, which will be unnoticed for the rest of the
time in $[0, t_1]$.

The other two options for card [4] are label-to-label transpositions << [4], [1] >> and
<< [4], [2] >>, getting

    deck A :      [4]  [2]  [3]  [1]
    deck B :      [4]  [3]  [2]  [1]

and

    deck A :      [1]  [4]  [3]  [2]
    deck B :      [1]  [3]  [4]  [2]

respectively, which we suppress noting. The rates are the same for each one of
< [4], [3] >_A, < [4], [3] >_B, << [4], [1] >> and << [4], [2] >>.

• Card [2] does the label-to-location jump << [2], $i_1$ >> at time $t_1$.

The point is that in Case II, either of the following transposition sequences

    < [4], [3] >_A followed by << [2], $i_1$ >>

or

    < [4], [3] >_B followed by << [2], $i_1$ >>

would lead to the discrepancies' cancelation.

If at time $t_1$ the discrepancies are not canceled out, start anew with a new random $i_1$
and exponential $t_1$. There are two possibilities at these trials: either we end up with no
discrepancies or we end up in the same boat as before and we try again. Here we set the
coupling rules by conditioning only on one event, << [2], $i_1$ >>. If we condition on more than
one upcoming event (say < [4], [3] >_A in Case II), by conditioning inside the conditioning,
we can increase the probability of coincidence, thus producing a faster coupling time. Later
in the paper we will deal with conditioning on a chain of events.

The above example is a simplified version of the coupling construction to follow. 
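The four-card example can be replayed step by step. The sketch below encodes the two decks as lists, applies the moves described in Case I and Case II, and returns the resulting pair of decks so that one can see whether they agree. The helper names are hypothetical. The text treats only $i_1 = 4$ in Case II; handling $i_1 = 1$ by using the card found at location 1 in place of [4] is an assumption made here by symmetry.

```python
def move_card_to(deck, card, loc):
    """Label-to-location jump: put `card` at 1-based location `loc`,
    the displaced card taking `card`'s old location."""
    new = list(deck)
    old = new.index(card)
    new[old], new[loc - 1] = new[loc - 1], new[old]
    return new

def swap_cards(deck, x, y):
    """Label-to-label transposition: interchange cards x and y."""
    new = list(deck)
    i, j = new.index(x), new.index(y)
    new[i], new[j] = new[j], new[i]
    return new

A0, B0 = [1, 2, 3, 4], [1, 3, 2, 4]  # two discrepancies, at locations 2 and 3

def case_one(i1):
    """Case I (i1 = 2 or 3): card 2 jumps to i1 in both decks."""
    return move_card_to(A0, 2, i1), move_card_to(B0, 2, i1)

def case_two(i1, swap_on_top):
    """Case II (i1 = 1 or 4): swap the card at i1 with card 3 in one deck only,
    then let card 2 jump to i1 in both decks."""
    c = A0[i1 - 1]  # card at location i1, the same in both decks
    A, B = list(A0), list(B0)
    if swap_on_top:
        A = swap_cards(A, c, 3)
    else:
        B = swap_cards(B, c, 3)
    return move_card_to(A, 2, i1), move_card_to(B, 2, i1)
```

For instance, `case_two(4, True)` returns two copies of `[1, 3, 4, 2]`: the one-sided swap followed by the conditioned jump cancels both discrepancies.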

4.4.1 Two discrepancies and one association map ($d = 2$, $k = 1$).

We will start with the case of two discrepancies ($d = 2$): that is, when the coupling speed is
slowest. As previously implied, there are only two discrepancies, at sites $d_1$ and $d_2$:

    A_t :   ...  [6]  [b]  [9]  [a]  [8]  [a1]  ...
    B_t :   ...  [6]  [a]  [9]  [b]  [8]  [a1]  ...
                       ↑         ↑         ↑
                      d1        d2        i1

Due to the "group invariance" that was mentioned before, the above picture does not change
until either [a] or [b] jumps to either $d_1$ or $d_2$, in both the top and the bottom processes. The
rest of the transpositions are "label-to-label", and as such, will not change the picture. If
we do not adjust this coupling, the waiting time to cancel the two discrepancies will average
$n^2/4$, which is too large, so we are going to modify the coupling construction.

We take one of the two uncoupled cards, for example [a]. There is a random site $i_1$ such that
<< [a], $i_1$ >> is bound to happen, with respect to group invariance, at a certain exponential
time $t_1$. If $i_1 = d_1$ or $d_2$, the discrepancies disappear, and the coupling is complete. If $i_1 \neq d_1$
or $d_2$, we will associate sites as if the transposition << [a], $i_1$ >> has already occurred. That
is, up until time $t_1$, we identify site $i_1$ in $A_t$ with site $d_1$ in $B_t$, and call it $i_1/d_1$, and we
associate site $d_2$ in $A_t$ with site $i_1$ in $B_t$, and call it $d_2/i_1$. Finally we identify site $d_1$ in $A_t$
and site $d_2$ in $B_t$, and call it $d_1/d_2$. Thus, we will get the following association map with
respect to transposition << [a], $i_1$ >> and jump time $t_1$:

    A_t :   ...  [6]  [b]   [9]  [a1]  [8]  [a]  ...
    B_t :   ...  [6]  [a1]  [9]  [b]   [8]  [a]  ...
                       ↑          ↑          ↑
                     i1/d1      d2/i1      d1/d2            (1)

At time $t_1$ of the association map, the three sites are renamed according to the following
association map rule

    $d_1/d_2 \to i_1$
    $i_1/d_1 \to d_1$
    $d_2/i_1 \to d_2$

From our new perspective, time $t_1$ is the time when the location names change according
to the above law. We will say that the association map expires at $t_1$.



We use the above association map to determine the rates for the coupled process in the 
time interval (0,ti]: 



Rates for [al: the first jump <^ [a], ii ^ occurs at time ti. 



Rates for \ al | : | al | does label-to- location jumps, where locations are defined by the 
association map ([1]). Transpositions <C al , i ^ occur for all i except i = ii,di, or d2, 



with the usual rate of Transpositions <^ al , ii/di ^ and ^ al , d2/ii ^ occur 
with rate ^ as well. Transposition <^ al ,di/d2 ^ is label-to- label, hence we do not 
count it, and do not relabel. 



Rates for $\boxed{b}$: transpositions $\ll \boxed{b}, d_1 \gg$ and $\ll \boxed{b}, d_2 \gg$ of the coupled process, as well as transpositions $\langle \boxed{b}, \boxed{a_1} \rangle_A$ of the top process $A_t$, and $\langle \boxed{b}, \boxed{a_1} \rangle_B$ of the bottom process $B_t$, occur independently with rate $\frac{1}{n^2}$ each. The rest of the transpositions initiated by $\boxed{b}$ are label-to-label transpositions of the coupled process, for which we do not need to relabel. After one of the six transpositions that cancel discrepancies, $\ll \boxed{a_1}, i_1/d_1 \gg$, $\ll \boxed{a_1}, d_2/i_1 \gg$, $\ll \boxed{b}, d_1 \gg$, $\ll \boxed{b}, d_2 \gg$, $\langle \boxed{b}, \boxed{a_1} \rangle_A$, and $\langle \boxed{b}, \boxed{a_1} \rangle_B$, occurs, $\boxed{a_1}$ will only be allowed to do label-to-label jumps.

For the rest of the cards, we apply label-to-label transpositions of the coupled process.



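Since we classified (and therefore excluded) $\ll \boxed{a_1}, d_1/d_2 \gg$ as a label-to-label transposition, the jump time $t_2$ for the card $\boxed{a_1}$ is exponential$\left(\frac{1}{n} \cdot \frac{n-1}{n}\right)$. Now, $t_2 < t_1$ with probability

$$
\frac{\frac{n-1}{n^2}}{\frac{1}{n} + \frac{n-1}{n^2}} = \frac{n-1}{2n-1}.
$$

In this case, transpositions $\ll \boxed{a_1}, i_1/d_1 \gg$ and $\ll \boxed{a_1}, d_2/i_1 \gg$ cancel the discrepancies on the association map (1). That happens with probability $\frac{2}{n-1}$. Thus, the probability of canceling the discrepancy by time $t_1$ is

$$
\frac{1}{n}\left(2 - \frac{1}{2n-1}\right) + P[t_b < t_1]\left(1 - \frac{1}{n}\left(2 - \frac{1}{2n-1}\right)\right),
$$

where $\frac{1}{n}\left(2 - \frac{1}{2n-1}\right) = \frac{1}{n} + \left(1 - \frac{1}{n}\right) \cdot \frac{2}{2n-1}$ is the probability that the discrepancy is canceled as the result of the jump initiated by $\boxed{a}$ at $t_1$ or $\boxed{a_1}$ at $t_2 < t_1$, and if this does not happen, then with probability $P[t_b < t_1] = \frac{4}{n+4}$, the discrepancy is canceled with the jump of $\boxed{b}$.

Here $t_b$ is the first time one of the four transpositions $\ll \boxed{b}, d_1 \gg$, $\ll \boxed{b}, d_2 \gg$, $\langle \boxed{b}, \boxed{a_1} \rangle_A$ and $\langle \boxed{b}, \boxed{a_1} \rangle_B$ occurs, so $t_b$ is exponential$\left(\frac{4}{n^2}\right)$.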
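The races between exponential clocks in this paragraph can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original argument: it assumes that $t_1$ has rate $1/n$, that $t_2$ has rate $(n-1)/n^2$, and that $t_b$ has rate $4/n^2$, and the function names are illustrative.

```python
from fractions import Fraction


def race(rate_x, rate_y):
    """P[X < Y] for independent exponential clocks X and Y with the given rates."""
    return rate_x / (rate_x + rate_y)


def cancel_probabilities(n):
    """Return (P[t2 < t1], P[a1 jumps first and cancels], P[tb < t1]) for deck size n."""
    n = Fraction(n)
    rate_t1 = 1 / n              # assumed rate of the jump of card a to site i1
    rate_t2 = (n - 1) / n**2     # assumed rate of the first counted jump of card a1
    rate_tb = 4 / n**2           # assumed total rate of the four b-transpositions
    p_t2_first = race(rate_t2, rate_t1)
    p_tb_first = race(rate_tb, rate_t1)
    # a1 has n - 1 admissible targets, two of which cancel the discrepancies.
    p_a1_cancels = p_t2_first * Fraction(2) / (n - 1)
    return p_t2_first, p_a1_cancels, p_tb_first


if __name__ == "__main__":
    for n in (10, 52, 1000):
        print(n, *cancel_probabilities(n))
```

Under these assumptions the three quantities simplify to $\frac{n-1}{2n-1}$, $\frac{2}{2n-1}$ and $\frac{4}{n+4}$, all of order $1/n$ or a constant, which is what makes a single round succeed with probability of order $1/n$.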



So, with probability $\approx \frac{6}{n}$, the discrepancies cancel by time $t_1$. If they do not cancel, we repeat the association trick, thus coupling the two discrepancies in approximately $\frac{n^2}{6}$ steps on average, rather than the longer average waiting time of the unadjusted coupling. We only used one association map at a time, however.



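Example. We will illustrate how a cancellation of discrepancies on an association map implies the discrepancies' cancelation at time $t_1$. Consider the case when $t_2 < t_1$, and $i_2 = d_2$. Then we will observe the following evolutions on the association map. The configurations will evolve from

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{b} & 9 & \boxed{a_1} & 8 & \boxed{a} \\
B_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2/i_1 & & i_1/d_1 & & d_1/d_2
\end{array}
$$

to

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
B_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2/i_1 & & i_1/d_1 & & d_1/d_2
\end{array}
$$

at time $t_2$, and

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
B_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2 & & d_1 & & i_1
\end{array}
$$

at time $t_1$. Recall that $i_1 = d_1/d_2$, $d_1 = i_1/d_1$ and $d_2 = d_2/i_1$ before $t_1$.

The above were the transformations one would see on the association map. The corresponding evolutions of the decks with respect to the original site associations will be as follows. From

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{b} & 9 & \boxed{a} & 8 & \boxed{a_1} \\
B_t: & \cdots & 4 & 6 & \boxed{a} & 9 & \boxed{b} & 8 & \boxed{a_1} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2 & & d_1 & & i_1
\end{array}
$$

to

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{a} & 8 & \boxed{b} \\
B_t: & \cdots & 4 & 6 & \boxed{a} & 9 & \boxed{b} & 8 & \boxed{a_1} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2 & & d_1 & & i_1
\end{array}
$$

at time $t_2$, and to

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
B_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2 & & d_1 & & i_1
\end{array}
$$

at time $t_1$.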
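The example can be replayed mechanically. The sketch below is an illustration under simplifying assumptions, not the authors' construction in full: it tracks only the three sites $d_2$, $d_1$, $i_1$, represents each deck as a dictionary from site to card, and uses hypothetical helper names (`move`, `move_on_map`). A jump on the association map is translated into one actual site per deck, and the deferred transposition of card $a$ to site $i_1$ is applied at time $t_1$.

```python
def move(deck, card, site):
    """Label-to-location transposition: swap `card` with the card sitting at `site`."""
    new = dict(deck)
    src = next(s for s, c in deck.items() if c == card)
    new[src], new[site] = deck[site], deck[src]
    return new


# Initial decks: they differ only at the discrepancy sites d1 and d2.
A0 = {"d2": "b", "d1": "a", "i1": "a1"}
B0 = {"d2": "a", "d1": "b", "i1": "a1"}

# Association map (1): map location -> (actual site in A, actual site in B).
ASSOC = {
    "d2/i1": ("d2", "i1"),
    "i1/d1": ("i1", "d1"),
    "d1/d2": ("d1", "d2"),
}


def move_on_map(deck_a, deck_b, card, location):
    """Apply a label-to-location jump whose target is a location of the association map."""
    site_a, site_b = ASSOC[location]
    return move(deck_a, card, site_a), move(deck_b, card, site_b)


# Time t2 < t1: card a1 jumps to the map location d2/i1.
A2, B2 = move_on_map(A0, B0, "a1", "d2/i1")

# Time t1: the deferred transposition of card a to site i1 happens in both decks.
A1, B1 = move(A2, "a", "i1"), move(B2, "a", "i1")

if __name__ == "__main__":
    print("t2:", A2, B2)
    print("t1:", A1, B1, "coupled:", A1 == B1)
```

The decks still disagree at time $t_2$ in the original site coordinates and only coincide once the map expires at $t_1$, which matches the two sequences of pictures above.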



4.4.2 Two discrepancies and $\varepsilon n$ association maps ($d = 2$, $k = \lfloor \varepsilon n \rfloor$).

Suppose we use one association map (1), as above. In the new picture, we again have two discrepancies, and we can start by trying to cancel them without waiting for time $t_1$. We do this by considering transposition $\ll \boxed{a_1}, i_2 \gg$, where $i_2$ is a randomly selected site on the new scheme below. Important: The scheme (1) doesn't change after $t_1$, only the location names do. So, if $i_2$ was equal to $i_1/d_1$ before $t_1$, then $i_2$ will be equal to $d_1$ after $t_1$.



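$$
\begin{array}{rcccccccc}
A_t: & \cdots & \boxed{a_2} & 6 & \boxed{b} & 9 & \boxed{a_1} & 8 & \boxed{a} \\
B_t: & \cdots & \boxed{a_2} & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & \uparrow & & \uparrow & & \uparrow & & \uparrow \\
 & & i_2 & & d_2 & & d_1 & & i_1
\end{array}
$$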



Here $d_1$ denotes $i_1/d_1$ before $t_1$, and $d_1$ after $t_1$. Similarly $d_2$ denotes $d_2/i_1$ before $t_1$, and $d_2$ after $t_1$, and $i_1$ denotes $d_1/d_2$ before $t_1$, and $i_1$ after $t_1$.



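Note: If $i_2 = i_1$ in the above example, then

$$
\begin{array}{rcccccccc}
A_t: & \cdots & 4 & 6 & \boxed{b} & 9 & \boxed{a_1} & 8 & \boxed{a} \\
B_t: & \cdots & 4 & 6 & \boxed{a_1} & 9 & \boxed{b} & 8 & \boxed{a} \\
 & & & & \uparrow & & \uparrow & & \uparrow \\
 & & & & d_2 & & d_1 & & i_2
\end{array}
$$

with respect to the first association map. We do not interchange labels $\boxed{a}$ and $\boxed{a_1}$. Rather we reselect $i_2$ and wait one more exponential$\left(\frac{1}{n}\right)$ time for $\ll \boxed{a_1}, i_2 \gg$ to occur, up until the first time $\ll \boxed{a_1}, i_2 \gg$ with $i_2 \in \{1, 2, \ldots, n\} \setminus \{i_1\}$ rings. In other words, the jump time $t_2$ for $\ll \boxed{a_1}, i_2 \gg$ with $i_2 \in \{1, 2, \ldots, n\} \setminus \{i_1\}$ is exponential$\left(\frac{1}{n} \cdot \left[1 - \frac{1}{n}\right]\right)$.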

Once again, on the new scheme, if $i_2 = d_1$ or $d_2$, the discrepancies cancel, and the process couples after time $\tau_2 = \max\{t_1, t_2\}$. Also, if $i_2 \neq d_1$ or $d_2$, we will construct one more association map, identifying site $i_2$ in $A_t$ with site $d_1$ in $B_t$ and calling it $i_2/d_1$, identifying site $d_2$ in $A_t$ with site $i_2$ in $B_t$ and calling it $d_2/i_2$, and last, identifying site $d_1$ in $A_t$ and site $d_2$ in $B_t$ and calling it $d_1/d_2$. Thus after the second round, we get the following association map:



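$$
\begin{array}{rccccccccc}
A_t: & \cdots & \boxed{a_1} & 6 & \boxed{b} & 9 & \boxed{a_2} & 8 & \boxed{a} & 2 \\
B_t: & \cdots & \boxed{a_1} & 6 & \boxed{a_2} & 9 & \boxed{b} & 8 & \boxed{a} & 2 \\
 & & \uparrow & & \uparrow & & \uparrow & & \uparrow & \\
 & & d_1/d_2 & & d_2/i_2 & & i_2/d_1 & & i_1 &
\end{array}
\tag{2}
$$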



Again we have two discrepancies, and we consider $\ll \boxed{a_2}, i_3 \gg$ for random $i_3 \in \{1, 2, \ldots, n\} \setminus \{i_1, i_2\}$, where $d_1$ denotes $i_2/d_1$ before $t_2$, and $d_1$ after $t_2$, $d_2$ denotes $d_2/i_2$ before $t_2$, and $d_2$ after $t_2$, and $i_2$ denotes $d_1/d_2$ before $t_2$, and $i_2$ after $t_2$.



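$$
\begin{array}{rccccccccc}
A_t: & \cdots & \boxed{a_1} & 6 & \boxed{b} & 9 & \boxed{a_2} & 8 & \boxed{a} & \boxed{a_3} \\
B_t: & \cdots & \boxed{a_1} & 6 & \boxed{a_2} & 9 & \boxed{b} & 8 & \boxed{a} & \boxed{a_3} \\
 & & \uparrow & & \uparrow & & \uparrow & & \uparrow & \uparrow \\
 & & i_2 & & d_2 & & d_1 & & i_1 & i_3
\end{array}
$$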





Note: $i_3$ is a location with respect to the association map (2).

The waiting time $t_3$ for $\ll \boxed{a_2}, i_3 \gg$ is exponential$\left(\frac{1}{n} \cdot \left[1 - \frac{2}{n}\right]\right)$. Each time we either cancel the discrepancies with probability $\frac{2}{n-j+1}$, or construct one more association map.



We will construct a chain of at most $k = \lfloor \varepsilon n \rfloor$ association maps like that, where $\varepsilon \in (0, 1)$ is fixed. Observe: For all cards other than $\boxed{a_1}, \boxed{a_2}, \ldots, \boxed{a_k}$ and $\boxed{b}$, w.r.t. group invariance, all jumps are label-to-label transpositions. For cards $\boxed{a_1}, \boxed{a_2}, \ldots, \boxed{a_k}$, each $\boxed{a_j}$ does label-to-location jumps with respect to the $j$-th association map. Let $T_a$ be the first time one of the following transpositions occurs on the corresponding association map: $\ll \boxed{b}, d_1 \gg$, $\ll \boxed{b}, d_2 \gg$, $\ll \boxed{a_j}, d_1 \gg$ and $\ll \boxed{a_j}, d_2 \gg$ for all $j = 1, \ldots, k$.

The $k = \lfloor \varepsilon n \rfloor$ association maps also define the rates for $\boxed{b}$: for each $j \in \{1, \ldots, k\}$, we will cancel discrepancies with transpositions $\langle d_2, \boxed{a_j} \rangle_A$ and $\langle d_1, \boxed{a_j} \rangle_B$ occurring independently until a discrepancy is canceled, all with respect to the $j$-th association map. The $k$ association maps will, on average, expire in less than $n \log(\varepsilon n)$ units of time. In other words, card $\boxed{b}$ (w.r.t. group invariance) makes jumps that are label-to-label transpositions $\ll \boxed{b}, \boxed{c} \gg$ if the card $\boxed{c}$ is not one of $\boxed{a_1}, \boxed{a_2}, \ldots, \boxed{a_k}$. For $\boxed{a_1}, \boxed{a_2}, \ldots, \boxed{a_k}$, transpositions $\langle \boxed{b}, \boxed{a_1} \rangle_A$, $\langle \boxed{b}, \boxed{a_1} \rangle_B$, $\langle \boxed{b}, \boxed{a_2} \rangle_A$, $\langle \boxed{b}, \boxed{a_2} \rangle_B$, $\ldots$, $\langle \boxed{b}, \boxed{a_k} \rangle_A$, $\langle \boxed{b}, \boxed{a_k} \rangle_B$ (with respect to the $j$-th association map) occur independently until a discrepancy is canceled.



Recall $k = \lfloor \varepsilon n \rfloor$. We let $T_b$ be the first time of a jump from one of the discrepancy locations, $d_1$ or $d_2$, to some $\boxed{a_j}$ in the top, or the bottom, or both processes. Then $T_b$ is

exponential ( '^^'^t^^ ) . 

So, the average time for the discrepancy cancelation over all $k$ association maps is $T_2 = \min\{T_a, T_b\}$ with

n 

If we cancel the discrepancy on one of the $k$ association maps, then the coupling completes at time $\tau_k = \max\{t_1, t_2, \ldots, t_k\}$, where for each $j \in \{1, 2, \ldots, k\}$,

$t_j$ is exponential$\left(\frac{1}{n} \cdot \left[1 - \frac{j-1}{n}\right]\right)$.

For $n$ large enough,

$$
E[\tau_k] \le \frac{n}{1 - \frac{k}{n}} \log k \le \frac{n}{1 - \varepsilon} \log(\varepsilon n).
$$
The k association maps will, on average, expire in less than log {en) + ^ units of time. 
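The size of the expiration time $\max\{t_1, \ldots, t_k\}$ can be explored numerically. The sketch below is a Monte Carlo illustration, not the authors' computation: it assumes that the $t_j$ are independent exponentials with rates $\frac{1}{n}\left[1 - \frac{j-1}{n}\right]$, and it compares the sample mean with the elementary bound $H_k / \lambda_{\min}$, where $H_k$ is the $k$-th harmonic number and $\lambda_{\min}$ is the smallest rate (the mean of the maximum of $k$ i.i.d. exponentials with rate $\lambda_{\min}$). All function names are illustrative.

```python
import math
import random


def expiration_rates(n, k):
    """Assumed rates of t_1, ..., t_k: (1/n) * (1 - (j - 1)/n)."""
    return [(1.0 / n) * (1.0 - (j - 1) / n) for j in range(1, k + 1)]


def mean_max_expiration(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of E[max(t_1, ..., t_k)] for independent t_j."""
    rng = random.Random(seed)
    rates = expiration_rates(n, k)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(r) for r in rates)
    return total / trials


def harmonic_bound(n, k):
    """H_k / (smallest rate): dominates E[max] because every rate is at least the smallest."""
    h_k = sum(1.0 / i for i in range(1, k + 1))
    return h_k / min(expiration_rates(n, k))


if __name__ == "__main__":
    n, eps = 200, 0.25
    k = int(eps * n)
    print("simulated mean:", mean_max_expiration(n, k))
    print("harmonic bound:", harmonic_bound(n, k))
    print("n/(1-eps) * (log k + 1):", n / (1 - eps) * (math.log(k) + 1))
```

Since $H_k \le \log k + 1$ and $\lambda_{\min} \ge \frac{1-\varepsilon}{n}$, the harmonic bound is itself at most $\frac{n}{1-\varepsilon}(\log k + 1)$, which is the order of magnitude used in the text.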



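4.5 The case of $d > 2$ discrepancies and $k = \lfloor \varepsilon n \rfloor$ association maps.

Let $d_1, d_2, \ldots, d_d$ denote all the discrepancies, and $\boxed{b_1}, \ldots, \boxed{b_d}$ denote all the discrepancy cards.

$$
\begin{array}{rccccccccc}
A_t: & \cdots & 4 & 6 & \boxed{b_2} & 9 & \boxed{b_5} & \boxed{b_3} & \boxed{b_4} & \boxed{b_1} \\
B_t: & \cdots & 4 & 6 & \boxed{b_5} & 9 & \boxed{b_1} & \boxed{b_4} & \boxed{b_2} & \boxed{b_3} \\
 & & & & \uparrow & & \uparrow & \uparrow & \uparrow & \uparrow \\
 & & & & d_1 & & d_2 & d_3 & d_4 & d_5
\end{array}
$$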

In the case of $d > 2$ discrepancies, each discrepancy card $\boxed{b_i}$ will have a chain of $\frac{k}{d}$ association maps attached to it, adding new association maps at expiration times. See the preceding paragraphs for the construction of chains of association maps. Each chain will determine the rates for $\boxed{b_m}, \boxed{a_{m1}}, \boxed{a_{m2}}, \ldots$ A transposition $\ll \boxed{a_{mj}}, d_i \gg$ will cancel discrepancies on the $j$-th association map, in the $m$-th chain of association maps. So, for each $m$ from 1 to $d$, a transposition $\ll \boxed{b_m}, i_{m1} \gg$ is expected to take place at time $t_{m1}$, and an association map is created. Transposition $\ll \boxed{a_{m1}}, i_{m2} \gg$, where $i_{m2}$ ($\neq i_{m1}$) is a location with respect to the association map, is expected to take place at time $t_{m2}$, and a new association map is created, and so we continue, creating, for each $m \in \{1, \ldots, d\}$, a chain of $\lfloor \frac{k}{d} \rfloor$ or $\lfloor \frac{k}{d} \rfloor + 1$ association maps. Since all locations $i_{mj}$ corresponding to the $k = \lfloor \varepsilon n \rfloor$ association maps are different, for a given fixed $\kappa \in (\varepsilon, 1)$, if we keep the number of association maps below $\kappa n$ at all times, then each expiration time $t_{mj}$ will have a Poisson exponential rate bounded below by $\frac{1-\kappa}{n}$.

Consider for instance the $j$-th association map with respect to the $m$-th chain of association maps, $\boxed{a_{m1}}, \boxed{a_{m2}}, \ldots$ (here $j$ is between 1 and $\frac{k}{d}$):

[Diagram of the $j$-th association map in the $m$-th chain: the rows $A_t$ and $B_t$ over the discrepancy sites $d_1, \ldots, d_5$, with the card $\boxed{a_{mj}}$ appearing among the discrepancy cards.]

Here, each $\boxed{a_{mj}}$ does label-to-location jumps according to the above $j$-th association map. Each time an association map expires, we construct a new one. So each time we have $k = \lfloor \varepsilon n \rfloor$ association maps. Thus, the time of cancellation using one of the following transpositions



< I a^j I , > : {i,m,j) e {l,...,d} X {l,...,d} X 



or < 



, ^ is exponential(^^^^ ). 



We will also cancel discrepancies with ^ ^3, 

amj >A and < ^2 



>, 



a 



mj 



> 



and two transpositions < di. 



a 



mj 



>B occurring independently until a 



discrepancy is canceled. All with respect to the $j$-th association map in the chain of association maps that starts with $\boxed{b_m}$.

(excluding 



This will not disturb the rates of 



bl 



), because 



will 



not occupy any of the discrepancy locations until the time of discrepancy cancelation. 

Wc let T5 again be the first time of a jump from one of the discrepancy locations di to 
in the top, or the bottom, or both processes. Again, T^, is exponential (^^^^^^^ ) . 



some 



a 



mj 



Therefore, the average time for the discrepancy cancelation over all $k$ association maps is $T_d = \min\{T_a, T_b\}$ with

$$
E[T_d] \le \frac{n^2}{2(k+1)d} \le \frac{n}{2\varepsilon d}.
$$



When a discrepancy is cancelled, after $\frac{n}{2\varepsilon d}$ units of time (on average), we will reuse $d - 1$ chains of association maps. So at most, only $\varepsilon n/d$ association maps will not be reused.

Every time we cancel one of $d$ discrepancies, we are left with an average of $\frac{1}{2} \cdot \frac{\varepsilon n}{d}$ non-reusable association maps in the corresponding chain, as a chain of association maps has length $\approx \frac{\varepsilon n}{d}$. There the discrepancy is canceled on one of them, and all the association maps built on top of it can be forgotten, while the average number of $\frac{1}{2} \cdot \frac{\varepsilon n}{d}$ of the remaining association maps cannot be reused to cancel more discrepancies.

If we impose an upper bound $\kappa n$ ($\varepsilon < \kappa < 1$) on the number of association maps allowed at any one time, then the average time of discrepancy cancelation will be delayed by at most

$$
\frac{n}{1-\kappa} \cdot \log\left[1 + \frac{\varepsilon}{(\kappa - \varepsilon)d}\right] \le \frac{\varepsilon}{(1-\kappa)(\kappa-\varepsilon)} \cdot \frac{n}{d}.
$$



Thus it will take less than

$$
\left[\frac{1}{2\varepsilon} + \frac{\varepsilon}{(1-\kappa)(\kappa-\varepsilon)}\right] n \log n
$$

to cancel every discrepancy, each w.r.t. some association map.

After the discrepancies cancel with respect to the association maps, it will take an average of about $\frac{n}{1-\kappa} \log(\kappa n)$ units of time for all remaining association maps to expire. Thus the upper bound on the expectation of coupling time will be

$$
\left[\frac{1}{2\varepsilon} + \frac{\kappa}{(1-\kappa)(\kappa-\varepsilon)}\right] n \log n \quad \text{for any } 0 < \varepsilon < \kappa < 1.
$$

We obtained a $Cn \log n$ upper bound on coupling time with optimal $C < 6$. We do, however, believe that improving the above construction to produce a $\frac{1}{2} n \log n$ upper bound is achievable.
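The claim $C < 6$ can be checked numerically. The sketch below assumes that the constant in front of $n \log n$ has the form $C(\varepsilon, \kappa) = \frac{1}{2\varepsilon} + \frac{\kappa}{(1-\kappa)(\kappa-\varepsilon)}$ for $0 < \varepsilon < \kappa < 1$, and it searches a grid for a small value; the function names and the grid resolution are illustrative choices, not part of the paper.

```python
def coupling_constant(eps, kappa):
    """C(eps, kappa) = 1/(2 eps) + kappa / ((1 - kappa)(kappa - eps)), for 0 < eps < kappa < 1."""
    if not 0.0 < eps < kappa < 1.0:
        raise ValueError("need 0 < eps < kappa < 1")
    return 1.0 / (2.0 * eps) + kappa / ((1.0 - kappa) * (kappa - eps))


def best_constant(steps=200):
    """Grid search over eps = i/steps < kappa = j/steps; returns (C, eps, kappa)."""
    best = (float("inf"), None, None)
    for i in range(1, steps):
        for j in range(i + 1, steps):
            eps, kappa = i / steps, j / steps
            c = coupling_constant(eps, kappa)
            if c < best[0]:
                best = (c, eps, kappa)
    return best


if __name__ == "__main__":
    print("C(0.2, 0.5) =", coupling_constant(0.2, 0.5))
    print("grid minimum (C, eps, kappa) =", best_constant())
```

For instance, $\varepsilon = 0.2$ and $\kappa = 0.5$ already give $C = 2.5 + \frac{0.5}{0.5 \cdot 0.3} = \frac{35}{6} < 6$. For fixed $\varepsilon$ the second term is smallest at $\kappa = \sqrt{\varepsilon}$, so the search is effectively one-dimensional.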



4.5.1 Computations.

Let the time interval it takes to cancel one of the $d$ discrepancies on one of the $k = \lfloor \varepsilon n \rfloor$ association maps be denoted by $I_d$. Then $|I_d| = T_d$, where $E[T_d] \le \frac{n}{2\varepsilon d}$. We want to impose an upper bound $\kappa n$ ($\varepsilon < \kappa < 1$) on the number of association maps allowed at any one time; then each expiration time $t_{mj}$ for the association map corresponding to $\boxed{a_{mj}}$ will have Poisson exponential rate $\ge \frac{1-\kappa}{n}$. Let $Q_d$ denote the number of non-reusable association maps at the start of the $I_d$ time interval. Then $Q_d \le (\kappa - \varepsilon)n$. Let us bound the expectation of the length of the time interval $J_d$ (with the same starting point as $I_d$) requiring enough of the $Q_d$ non-reusable association maps to expire, so that there are less than $(\kappa - \varepsilon)n - \frac{\varepsilon n}{d}$ of them left. We let

$$
k_* = \max\{0, \lfloor Q_d - (\kappa - \varepsilon - \varepsilon/d)n \rfloor + 1\}
$$

be the number of expirations of $Q_d$ non-reusable association maps to occur within the $J_d$ time interval. Then by the coupon collector argument, the interval's length $|J_d|$ is dominated by the sum of independent exponential random variables with parameters



$$
Q_d \, \frac{1-\kappa}{n}, \quad (Q_d - 1) \, \frac{1-\kappa}{n}, \quad \ldots, \quad (Q_d - k_* + 1) \, \frac{1-\kappa}{n}.
$$



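Thus

$$
E|J_d| \le \frac{n}{1-\kappa} \log\frac{Q_d}{Q_d - k_*} \le \frac{n}{1-\kappa} \log\left[1 + \frac{\varepsilon}{(\kappa - \varepsilon)d}\right] \le \frac{\varepsilon}{(1-\kappa)(\kappa-\varepsilon)} \cdot \frac{n}{d}.
$$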
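The first inequality in this chain is the usual comparison of a harmonic-type sum with a logarithm. As a small numerical sketch (the parameter values and function names are illustrative assumptions), the expected value of a sum of independent exponentials with rates $i \cdot \frac{1-\kappa}{n}$, $i = Q_d, Q_d - 1, \ldots, Q_d - k_* + 1$, is $\frac{n}{1-\kappa} \sum_i \frac{1}{i}$, and it never exceeds $\frac{n}{1-\kappa} \log \frac{Q_d}{Q_d - k_*}$ as long as $Q_d - k_* \ge 1$:

```python
import math


def expected_wait(q, k_star, n, kappa):
    """Mean of a sum of independent exponentials with rates i * (1 - kappa) / n,
    for i = q, q - 1, ..., q - k_star + 1."""
    return sum(n / (i * (1.0 - kappa)) for i in range(q - k_star + 1, q + 1))


def log_bound(q, k_star, n, kappa):
    """Upper bound n / (1 - kappa) * log(q / (q - k_star)); requires q - k_star >= 1."""
    return n / (1.0 - kappa) * math.log(q / (q - k_star))


if __name__ == "__main__":
    n, kappa = 1000, 0.5
    for q, k_star in [(300, 40), (300, 1), (50, 49)]:
        print(q, k_star, expected_wait(q, k_star, n, kappa), log_bound(q, k_star, n, kappa))
```

The comparison holds because $\frac{1}{i} \le \int_{i-1}^{i} \frac{dx}{x}$ for every $i \ge 2$, so the sum is bounded by the integral of $1/x$ from $Q_d - k_*$ to $Q_d$.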



So only after $\max\{|I_d|, |J_d|\}$ units of time will we have the discrepancy canceled on one of the association maps, and the total number of non-reusable association maps smaller than $(\kappa - \varepsilon)n$. Only then will we be able to start the new round $I_{d-1}$ (it can also be $I_{d-2}$ or $I_{d-3}$ if more than one discrepancy is cancelled at once) of discrepancy cancelations with respect to association maps. Here

$$
E[\max\{|I_d|, |J_d|\}] \le E[|I_d|] + E[|J_d|] \le \left[\frac{1}{2\varepsilon} + \frac{\varepsilon}{(1-\kappa)(\kappa-\varepsilon)}\right] \frac{n}{d}.
$$

It will take an average of

$$
\sum_{d} E[\max\{|I_d|, |J_d|\}] \le \left[\frac{1}{2\varepsilon} + \frac{\varepsilon}{(1-\kappa)(\kappa-\varepsilon)}\right] n \log n
$$

units of time in order to cancel all the discrepancies with respect to the association maps while keeping the number of non-reusable association maps under $(1 - \kappa)n$.

References 

[1] D. Aldous, Random Walks on Finite Groups and Rapidly Mixing Markov Chains, Seminaire de Probabilites XVII, Lecture Notes in Math. 986 (1983), 243-297

[2] D. Aldous and J. A. Fill, Reversible Markov Chains and Random Walks on Graphs, http://www.stat.berkeley.edu/users/aldous/

[3] N. Berestycki and R. Durrett, A phase transition in the random transposition random walk, Probability Theory and Related Fields 136 (2006), no.2, 203-233

[4] P. Diaconis, Group Representations in Probability and Statistics, Institute of Mathematical Statistics, Hayward CA (1973)

[5] P. Diaconis, Random Walk on Groups: Geometry and Character Theory, Groups St. Andrews, 2001. Neuman et al, ed. Cambridge: Cambridge University Press (2003)

[6] P. Diaconis and L. Saloff-Coste, Comparison Theorems for Reversible Markov Chains, Ann. Prob. 21 (1993), no.4, 2131-2156

[7] P. Diaconis and M. Shahshahani, Generating a random permutation with random transpositions, Z. Wahrsch. Verw. Gebiete 57 (1981), 159-179

[8] T. Hayes and E. Vigoda, A Non-Markovian Coupling for Randomly Sampling Colorings, Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (2003), 618-627

[9] C. Hoffman and D. Rudolph, Uniform Endomorphisms which are isomorphic to a Bernoulli shift, Annals of Mathematics 156 (2002), no.1, 79-101

[10] P. Matthews, A strong uniform time for random transpositions, Journal of Theoretical Probability 1 (1988), no.4, 411-423

[11] Y. Peres, Mixing for Markov Chains and Spin Systems, http://www.stat.berkeley.edu/users/peres/ubc.pdf

[12] L. Saloff-Coste, Random Walks on Finite Groups, Probability on Discrete Structures, H. Kesten (Editor), Springer (2004), 263-346

[13] V. A. Volkonskii and Yu. A. Rozanov, Some limit theorem for random functions, Theory of Probability and Its Applications (4) (1959), 178-197



